Executive Summary

Prompted by the recent growth in the proportion of minority students entering the Florida
State University System (SUS) as Community College Transfers (CCTs), and by an earlier scan
which suggested consistent bias against females and minorities on entrance examinations,
the current study analyzed over one million SUS applicants to determine whether any biases
in entrance test scores tended to favor either whites over minorities or males over females.

Background 

Today, few question the economic value of higher education to individuals. Day &
Newburger (2002, p. 2) show that individuals holding an associate's and a bachelor's
degree average respectively 26% and 72% more annual income than those with a high
school diploma. Therefore, anything that interferes with an individual's opportunity to
attain a degree costs both individuals and society (in the form of tax dollars and
well-documented societal benefits; Wellman, 2002). Between 1996 and 2003, the Florida
SUS experienced a 33% increase in new undergraduate matriculants (70% among
FTICs) during a time when state funding in absolute dollars increased by 39%, but in
constant HEPI dollars grew by only 11%. Faced with rising student numbers and fewer
real dollars to serve each student, SUS schools appear to have chosen to reduce access
more than service. Therefore, entrance requirements became more stringent as
institutions attempted to reduce the size of their matriculating FTIC cohorts (Vogel,
2006). Usually, in higher education, more stringent entrance requirements translate to
higher entrance test score requirements. Supporting this, during the time in question,
minority native (FTIC) matriculants to SUS institutions, despite rapid growth among 
Florida high school graduates, remained flat (moving from 36.4% to 38.0%), while their 
entrance rates as CCTs increased by 9.4 percentage points (from 27.8% to 37.2%). Female representation
among CCTs also grew more than 2.5 times as much as it did among FTIC students. 
Florida community colleges have an open door policy for Florida high school graduates; 
thus, tests play no part in admissions. 

A substantial literature documents problems for the above groups on standardized tests
such as the ACT and the SAT (Micceri, 2001, lists and reports on a large number of
these). Such measures were initially championed by the American Eugenics Society to
"purify" the race of low-grade and degenerate groups such as minorities and the poor
(Gibson, 2001). Despite thousands of studies conducted by higher education institutions
and many others supported by testing services, it is almost impossible to find
relationships between entrance examinations and any outcomes other than first
semester grades, which is the only outcome to which test makers themselves claim their
product relates (Fairtest, 2006; Elert, 1992). First semester grades fail to provide any
indication of long-term retention or graduation (Micceri, 2001). The current study
sought to address the research question in a more thorough fashion, and from a more
valid research perspective than most historical work on the topic.

Methods 

This study addressed the following research question: "Do any consistent score
differences occur on Standardized Tests between different sexes or race/ethnicities who
exhibit the same historic academic performance as measured by high school GPA?"
Using data from the Florida SUS Admissions files for all First Time In College (FTIC)
applicants to all eleven SUS institutions for the Academic Years 1997-98 through
2005-06, mean test scores (SAT, Quantitative & Verbal Subscores, ACT) were computed for
each subgroup at each 1/10th of a point of high school GPA (e.g., 3.6 is one group, 3.7
another). If tests measure what they purport to measure, academic performance, one
would expect only random deviations to occur between groups who exhibit almost
identical academic performance as measured by GPA, a reliable, multi-year,
multi-source estimate of academic performance. Consistent deviations from the expected
random pattern must therefore reflect bias. Subgroups included females and males within each
major racial/ethnic subgroup: African American, Asian, Hispanic, white and Other.
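
As an illustration only, the grouping step described above might be sketched as follows in Python. The record layout (hsgpa, group, sat), the function names, the sample values, and the use of rounding to assign a GPA to its tenth-of-a-point group are all assumptions made for this sketch; they are not taken from the study's data files or software.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_gpa_bin(records):
    """Return {(gpa_bin, group): mean score} for hypothetical applicant records."""
    buckets = defaultdict(list)
    for rec in records:
        # Assumption: a GPA is assigned to its tenth-of-a-point group by rounding.
        gpa_bin = round(rec["hsgpa"], 1)
        buckets[(gpa_bin, rec["group"])].append(rec["sat"])
    return {key: mean(scores) for key, scores in buckets.items()}


def gaps_between(means, group_a, group_b):
    """Return {gpa_bin: mean(group_a) - mean(group_b)} for bins containing both groups."""
    bins_a = {gpa_bin for (gpa_bin, group) in means if group == group_a}
    bins_b = {gpa_bin for (gpa_bin, group) in means if group == group_b}
    return {b: means[(b, group_a)] - means[(b, group_b)] for b in sorted(bins_a & bins_b)}


# Made-up records for two hypothetical groups, "A" and "B".
records = [
    {"hsgpa": 3.6, "group": "A", "sat": 1100},
    {"hsgpa": 3.6, "group": "A", "sat": 1140},
    {"hsgpa": 3.6, "group": "B", "sat": 1050},
    {"hsgpa": 3.7, "group": "A", "sat": 1200},
    {"hsgpa": 3.7, "group": "B", "sat": 1150},
    {"hsgpa": 3.7, "group": "B", "sat": 1170},
]

means = mean_score_by_gpa_bin(records)
gaps = gaps_between(means, "A", "B")
for gpa_bin, gap in gaps.items():
    print(f"HSGPA {gpa_bin}: group A minus group B = {gap}")
```

If the tests tracked academic performance alone, the per-bin gaps produced this way would be expected to scatter around zero rather than share one sign.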

Results and Discussion 

Among the 1.1 million student scores submitted to analysis at the same high school GPA,
where one would expect group means of large groups (smallest being over 500) to
cancel out and produce random effects (50% higher, 50% lower), whites consistently
outscored minorities by an average difference of over 60 points on the SAT.¹ Whites
outscored every minority group. An even greater discrepancy occurred between males
and females, with males holding an average advantage over females of 75 points on the
SAT at the same high school GPA. Considering that ETS (2001) claims a 60 point
difference shows a "...true difference in ability," and since females outperform males on
every academic performance measure one can choose (graduation, grades, etc.), and in
every discipline, even Engineering (Micceri, 2005, p. 10), one must ask: "A true
difference on what?" Apparently, a "true difference" in the ability to score high on
multiple-choice, standardized tests.

The consistent advantage that whites and males have on such tests suggests a
discriminatory bias that ultimately translates into economic, social and status
advantages for whites and males over minorities and females. This also assures that
faculty in universities remain primarily white and male, because most university faculty
earn their doctorates at flagship universities, and elite doctoral programs tend to prefer
applicants from other elite programs, who tend to use test scores as entrance
requirements. The effects of the discriminatory bias are well documented, most recently
by Haycock & Gerald (2006, p. 3), who state about public flagship institutions: "Even
as the number of low-income and minority high school graduates in their states grows,
often by leaps and bounds, these institutions are becoming disproportionately whiter
and richer."



1 Because some 65% of SUS applicants take the SAT, discussion is limited to the SAT total score for simplicity's 
sake. Similar effects occurred for all scores and subscores in all analyses.

How We Maintain the Academic Status Quo Through the Use of Biased Admissions Requirements

This paper attempts to show how a trend that began during World War II (WWII)
helps ensure that society's upper class of wealth and power gains advantages when it
comes to enrolling in Florida's public State University System (SUS) institutions. Higher
Education has historically been, at least until the proletariat revolutions of the 20th
Century, the exclusive territory of the aristocracy and wealthy: a place they could send
their young to get away from the dangers of interacting with city workers² and thereby
gain traditional cultural knowledge/capital. For example, between 1890 and 1900, fewer
than 5% of Americans aged 18 to 21 years attended higher education institutions (Goldin
& Katz, 1999, p. 41). By 1970, this percentage had risen to 70%, meaning that large
numbers of "commoners" were rubbing elbows with the elite, partly because of the 1965
Higher Education Act, which stated that colleges couldn't turn away applicants merely
because their families were poor. Such interaction with society's riffraff had traditionally
been avoided among the aristocracy through the use of exclusive (expensive) private
colleges. Theoretically, public universities, which are paid for by the taxes of all, should
offer even those of the lower classes the opportunity for higher education. However, in
the latter part of the 20th Century and the early years of the 21st Century, as the United
States experiences increasing wealth disparity (Sahadi, 2006; Witte & Henderson,
2004), we see a disturbing tendency to exclude "commoners". Haycock & Gerald (2006,
p. 3) state, regarding public flagship institutions: "Even as the number of low-income
and minority high school graduates in their states grows, often by leaps and bounds,
these institutions are becoming disproportionately whiter and richer." The primary tool
used by higher education to discriminate against commoners is a selection bias that
results from the use of stringent admissions requirements (read higher test score
requirements). In the Florida State University System (SUS) the "more stringent
admissions" arguably resulted from a 70% increase in First Time in College (FTIC)
matriculations during a time when constant dollar funding only increased by 11%³ (1996
to 2003). Although standardized tests were initially pushed to the fore by Conant as an
Admissions requirement in an attempt to reduce class bias, in reality, they perpetuate
the class bias effect, as this study will show. Stimulated by earlier research conducted in
an attempt to understand the recent movement of underprivileged minorities from
direct entry into the Florida SUS to transfer from community colleges, in addition to
Gibson's (2001, p. 1) claim that: "...The SAT measures, above all else, class, sex, and
race...", this study addresses the research question: Do any consistent score differences
occur on Standardized Tests between different sexes or race/ethnicities for students
exhibiting the same historic academic performance levels as measured by High School
Grade Point Average (HSGPA)?



2 Thus, most major state public universities locate in small, comparatively affluent towns like Albany, Austin, and Athens rather 
than New York City, Dallas or Atlanta. 

3 Using the Higher Education Price Index (HEPI) to provide a more realistic adjustment estimate than that of the Consumer Price
Index (CPI).

Background

Between 1996 and 2003, the Florida Public State University System (SUS) experienced a
70% increase in new First Time In College (FTIC) students attending one or more of the
institutions, and a total increase from 54,000 to 72,000 (33.3%) among new
undergraduates annually. During this same period of time, the total amount of money
allocated by the State to higher education in Florida increased by
39.3%, which appears adequate. However, if one adjusts for inflation using HEPI, this
represents only an 11.2% increase (SHEEO, 1997; Palmer, 2004; Commonfund, 2006).
Thus, Florida SUS institutions were faced with the choice of either reducing service or
reducing access, and they apparently chose the latter. As a result, it has become
increasingly difficult for high school students to gain admission to an SUS institution
(Vogel, 2006). During this time, minority enrollment directly from high school into the
SUS 4-year institutions remained relatively flat, growing by 1.6 percentage points
(36.4% to 38.0%), despite their increasing percentages in the overall Florida population
and among high school graduates (CDC, 2006). At the same time, their entry to the SUS
as Community College Transfers (CCTs) increased by 9.4 percentage points (27.8% to
37.2%). While African American representation among FTIC students remained the
same between 1996 and 2003 (18.3% and 18.4%), they showed a 30 percent
representation increase among CCTs, from 9.8% to 12.7%. Students classified as "Other"
more than doubled in CCT representation, from 2.9% to 6.2%, while Hispanic students
increased from 15.1% to 18.3%. Females remained more stable in both populations;
however, they too showed a 2.5 times greater increase among CCT students (2.4%) than
FTIC students (0.9%).
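
The figures above mix percentage-point changes, relative (percent) changes, and nominal versus constant-dollar growth. The short Python sketch below shows how the three kinds of quantity relate, using only numbers quoted in this section. The function names are hypothetical, and the implied price-index growth is an inference from the two funding figures, not a value reported by the sources cited.

```python
def point_change(before, after):
    """Change in percentage points between two percentages."""
    return after - before


def relative_change(before, after):
    """Relative change of a quantity, as a fraction of its starting value."""
    return (after - before) / before


def implied_inflation(nominal_growth, real_growth):
    """Price-index growth implied by nominal and constant-dollar growth rates."""
    return (1 + nominal_growth) / (1 + real_growth) - 1


# Minority share of CCT entrants: 27.8% to 37.2%.
print(round(point_change(27.8, 37.2), 1))            # percentage points
print(round(relative_change(27.8, 37.2) * 100, 1))   # percent

# African American share of CCT entrants: 9.8% to 12.7%.
print(round(relative_change(9.8, 12.7) * 100, 1))    # percent

# State funding: 39.3% nominal growth versus 11.2% growth in constant dollars.
print(round(implied_inflation(0.393, 0.112) * 100, 1))  # percent
```

Under these definitions, the move from 9.8% to 12.7% is a relative increase of about 30 percent, while the move from 27.8% to 37.2% is 9.4 percentage points.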

This increase in minority and female percentages among CCTs prompted an
investigation of possible bias in admissions that might adversely influence minorities'
and females' entry to SUS institutions. A quick analysis of 164,378 SUS fall FTIC
applicants (2001 to 2003) comparing mean SAT scores between whites and minorities
and between sexes at each 10th of a HSGPA point across racial/ethnic groups and sex⁴
showed a consistent bias favoring whites and males. In 181 of 189 race/ethnicity
comparisons whites scored higher than the comparison group. At the same GPA point,
for example, 3.6 or 3.7, the overall mean difference between whites and minorities
averaged 62 points across years. There was an even more extreme bias favoring males
over females than whites over minorities. Males exhibited higher mean SAT scores than
females in all 63 comparisons with an average advantage of 75 points across years
(Borman, Workman, Miller & Micceri, 2006). On the topic of differences between
individuals ETS (2001) states: "The user can be reasonably confident that a score
difference of around 60 points or more indicates a true difference in ability between two
test takers." This should be more true of group means than individual scores. Therefore,
if a "true difference" exists between males and females, one must ask: a "true
difference" in what? Females outperform males everywhere in education,
so it can't be academic performance. This background and these findings prompted the
current, broader and more intensive study.
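
One hedged way to gauge how far 181 of 189 (or 63 of 63) departs from the 50/50 split expected by chance is a one-sided sign test. The Python sketch below assumes each comparison behaves like an independent fair coin flip, which is a simplification, since comparisons at neighboring GPA levels and in adjacent years are unlikely to be fully independent. The function name is hypothetical, and this calculation is offered as an illustration rather than as an analysis reported in the study.

```python
from fractions import Fraction
from math import comb


def sign_test_upper_tail(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 1/2), computed exactly."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return Fraction(tail, 2 ** trials)


# Whites scored higher in 181 of 189 race/ethnicity comparisons.
p_race = sign_test_upper_tail(181, 189)

# Males scored higher in all 63 sex comparisons.
p_sex = sign_test_upper_tail(63, 63)

print(f"race/ethnicity comparisons: p = {float(p_race):.3g}")
print(f"sex comparisons:            p = {float(p_sex):.3g}")
```

Exact fractions are used so that the very small tail probabilities are not lost to rounding before they are printed.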



4 In this paper, the correct term sex will be used in place of the common, but erroneous term gender. Gender is a strictly linguistic 
term having three groups in English: masculine, feminine and neuter. People are male or female, not neuter.

The History of Standardized Tests as Admissions Criteria

First, it is useful to understand how standardized tests came to replace the traditional
essay as a primary selection tool for college admissions. Among the first to push for the
use of IQ tests as criteria for officer selection in the military were Terman and Yerkes,
executives in the American Eugenics Society (AES), who wanted to purify the race of
low-grade and degenerate groups such as minorities and the poor (Gibson, 2001). From
an opposite perspective, following the research of the 1930s and 1940s which showed
genius appearing among all classes, James Bryant Conant, at the time Harvard's
president, believed that in the half century leading up to 1940, the U.S. had gone from
being a classless society to one that was falling under the control of a hereditary
aristocracy. Conant hoped to use the SAT to select students for college who had virtue
and talent, assuming that the two went hand-in-hand. Regarding the SAT he wrote:

". . . we have before us a new type of social instrument whose proper use may be the 
means of salvation of the classlessness of the nation..." Conant hoped to use this new 
social instrument (the SAT) to sort and slot the entire populace on the basis of their test- 
defined intelligence in the name of creating a perfected, classless and democratic 
America modeled on Plato's Republic. 

Wars provide opportunity for restructuring society and Conant, realizing this, moved
quickly to establish the current testing regime after WWII began. Just after Pearl
Harbor, he replaced the old essay tests for college admission with the SAT for all
applicants at Harvard. In 1943, a revised SAT was administered to more than 300,000
people nationwide for officer-selection purposes. Immediately after the war, Conant,
through an adept series of bureaucratic maneuvers, arranged for all the leading
education tests and testing organizations in the country to be merged into a new,
private, non-profit entity that would effectively hold a monopoly in the field, the
Educational Testing Service (ETS). Because the service is private, it does not answer to
the populace, as do government-run testing organizations in most countries. Because
these aptitude tests were free to colleges, represented far less work on the colleges' part
than traditional essays, and were thought to be objective and valid, they spread rapidly
as a college selection criterion, aided by ETS' nationwide marketing. Conant's purpose
was to create an objective measure that could identify his aristocracy of virtue and talent
even when located in poorer communities (Lemann, 1999a, p. 53). Unfortunately, these
tests have not proven objective and precisely the opposite of what Conant hoped has
turned out to be true due to a consistent bias favoring affluence. As renowned
demographer Harold Hodgkinson (Hodgkinson, 1999) states: "SATs predict one thing
beautifully, but it's not the grades students will earn as freshmen; it's the household
income of the test-takers. For every $10,000 increase in household income, math and
verbal scores go up a minimum of nine points." One might ask how this could occur for
measures that purport to be objective. One primary factor is a set of biases in the tests
that consistently favor the affluent over lower-income individuals.

Biases Common to All Forms of Standardized Tests

Although many mistakenly think them objective, standardized tests exhibit numerous
types of bias against specific subpopulations of test-takers. As a result, tests consistently
underestimate specific groups' knowledge or performance. In measurement, biases arise
in some curious ways. For example, when researching his non-language IQ tests, Raven
(1939) discovered that the physical location of correct and erroneous answers
consistently influenced (biased) student errors. The following are a few of the most
obvious and common types of bias inherent to standardized tests.

1. Students from households using non-standard English face a consistent
disadvantage. No matter what their racial/ethnic group, poorer people in the United
States tend to grow up in such environments. As an example of how such biases can 
lead to totally erroneous conclusions, early in the 20th century, when widespread 
testing began, immigrants from Poland and Italy were considered stupid and inferior 
because they scored poorly on tests in their second language, English. 

2. Regarding cultural biases, test creators tend to grow up in middle-class to
upper-class American households. The test items such individuals write tend to reflect their
personal and cultural experiences. Those from poorer households, different
countries, cultures or religions frequently lack experience with the topics discussed
on tests. This type of bias first became apparent in the 1890s when researchers
suggested the use of field trips to the country for city public school kids because city
kids scored lower than country kids on standardized tests. Most of the test makers at
that time had grown up on farms, and city kids knew little about animals or farming
environments, which were common topics on tests created by former farm kids. As
an Australian once said: "If an average American were to take a test of required skills
that was created by a Bushman, they would test at the moron level."

3. The great majority of high-stakes and standardized tests are speeded. This tends to 
work against those who are perfectionists, those who process information slowly, 
and against those who are less likely to "take risks" (probably more generally true of 
females than males). Many people will not put down an answer until they are
positive it is correct (perfectionists). Obviously, in a speeded test, such individuals
have trouble answering enough questions to attain a high score. Additionally, some
process information more slowly than others. In most life situations other than 
driving race cars, this is not very relevant. However, when taking a timed test, it 
negatively biases estimates of knowledge, intellect and talent. 

4. Large numbers of students today suffer from test anxiety. This begins to show in 
about the third grade and may cause a student to perform poorly on tests. Prior 
performance being an influence in test anxiety, students who are subject to items #1, 
#2, #3 or #6 will tend to perform less well and therefore be subject to greater test 
anxiety, which may reduce their scores. 

5. Standardized tests and almost all tests of any type tend to heavily reward short-term 
memory skills. Although this is a useful skill to have, it is far from the only skill
needed when attempting to solve problems either in college, or in the real world. 
Almost all tests are biased against those who lack strong short-term memory skills 
no matter at what level their long-term memory skills may be. 

6. A fairly large proportion of the population has what is termed a learning disability by 
the educational community. This may represent any of a number of different ways of 
looking at the world, but in most cases, these different perspectives or methods of
processing information are associated with poorer performance in school and almost
always with erratic test performance, itself an indicator of a learning disability. 

7. As a rule, tests reflect a very limited perspective on a single type of intellectual
process that may be termed logical/analytical, and which rarely requires any
higher-order thinking by a respondent. Performance in school is also typically evaluated
using tests of similar level, although well-trained teachers use many other sources of
information when assigning grades. Unfortunately, many are not well trained in
measurement. As a result, it is not uncommon for those who think differently, and
who may be the greatest geniuses among us, to not exhibit this specific type of
intelligence as measured by either a teacher's instruments in schools or standardized
tests. (This bias frequently relates to #6, the learning ability/disability issue.)
Robbins (1987) notes some well-known examples from history:

Albert Einstein's parents were sure he was retarded because he spoke 
haltingly until the age of nine and even after that would respond to 
questions only after a long period of deliberation. He performed so badly
in his high school courses, except mathematics that a teacher told him to
drop out, saying, "You will never amount to anything Einstein." Charles
Darwin did so poorly in school that his father told him, "You will be a
disgrace to yourself and all your family." Thomas Edison was called
"dunce" by his father, "addled" by his high school teacher and was told 
by his headmaster that he "would never make a success of anything." 

Henry Ford barely made it through school with the minimum grasp of 
reading and writing. Sir Isaac Newton was so poor in school that he was 
allowed to continue only because he was a complete flop at running the 
family farm. Pablo Picasso was pulled out of school at the age of ten 
because he was doing so badly. His father hired a tutor to prepare him to 
go back to school, but the tutor gave up on the hopeless pupil. Giacomo 
Puccini, the Italian opera composer, was so poor at everything as a child,
including music, that his first music teacher gave up in despair, 
concluding that the boy had no talent. 

Several of the forms of bias noted above tend to influence less affluent groups most,
which include most minorities. For example, 2005 median incomes for males were:
Other, $27,041, Hispanics, $27,380, Blacks, $34,433, whites, $46,807 and Asians,
$48,693 (Webster & Bishaw, 2006). Many minority individuals, particularly those in the
lower socio-economic classes, rarely experience Standard English in their homes or
communities, and frequently not even in the low-performing schools they attend (realize
that the definition of low-performing almost always results from standardized tests that
are language biased against the enrolled students and frequently the teachers as well).
We can probably assume that the long-term gains on standardized measures shown by
less affluent minorities in research such as that on small class size effects result largely
from improved reading skills in Standard English (Mosteller, 1995; Illig, 1996).

FairTest (2006) documents a purposeful use of bias #2 above regarding scores on the
SAT: "The SAT is designed solely to predict students' first year college grades. Yet,
despite the fact that girls earn higher grades throughout both high school and college,
they consistently receive lower scores on the exam than do their male counterparts. In
1994, girls averaged 41 points lower than boys on the Math section of the test, and 4
points lower on the Verbal section."⁵



5 The reason these differences are lower than those reported in the current study is that the analyses conducted here control for
HSGPA. Because girls consistently earn higher grades than boys, comparing the average girl’s test scores to the average boy’s 
effectively compares those having a 3.63 mean HSGPA (females) with those having 3.46 (males) for 217,743 SUS FTIC 
enrollees between summer 2000 and spring 2006.

The article reports that the sex gap favoring boys persists across all other demographic
characteristics, including family income, parental education, grade point average, course
work, rank in class, size of high school, size of city, etc. A study by Phyllis Rosser (1989),
The SAT Gender Gap: Identifying the Causes, found that the vast majority of questions
exhibiting large gender differences in correct answer rates are biased in favor of males, 
despite females' superior academic performance. Rosser found that girls generally did 
better on questions about relationships, aesthetics and the humanities, while boys did 
better on questions about sports, the physical sciences and business. 

This conclusion is supported by an earlier study by ETS researcher Carol Dwyer, who
provides some historical perspective on the gender gap in her 1976 report. She notes
that it is common knowledge among test-makers that gender differences can be
manipulated by simply selecting different test items. Dwyer cites as an example the fact 
that, for the first several years the SAT was offered, boys scored higher than girls on the 
Math section but girls achieved higher scores on the Verbal section. ETS policy-makers 
determined that the Verbal test needed to be "balanced" more in favor of boys, and 
added questions pertaining to politics, business and sports to the Verbal portion. Since 
that time, boys have outscored girls on both the Math and Verbal sections. Dwyer notes 
that no similar effort has been made to "balance" the Math section, and concludes that 
"It could be done, but it has not been, and this suggests that either a conscious or 
unconscious form of sexism underlies this pattern. When girls show the superior 
performance, 'balancing' is required; when boys show the superior performance, no 
adjustments are necessary" (FairTest, 2006).

Such information calls into question both the motives and trustworthiness of test
developers, to whom we, as a nation, are entrusting hundreds of millions of dollars each
year plus the future of our youth. FairTest also reports research showing that other
biasing elements consistently work against females, including, but not limited to: the
multiple-choice format itself, speededness, and corrections for guessing.

In view of the preceding, it is not surprising that certain groups, particularly females, 
the less affluent and second language learners, score consistently lower than other 
groups. Thus, historical evidence suggests that such measures, rather than being 
objective estimates of talent or knowledge, are almost exactly the opposite. 

The Lack of Relationship Between Admissions Tests and College Performance 

Below is a summary of results from a typical sample of the hundreds of studies
conducted at various colleges and universities regarding the relationship between prior
academic variables (GPA and tests) and college performance. Such studies' overall
findings may be summarized for both undergraduate and graduate students by the
following statement:

Relationships between tests and any performance measures in college become
essentially zero when one controls for more important and predictive factors
such as prior performance (GPA), affluence (which may be imprecisely
estimated by part-time/full-time enrollment) and sex.



6 The word sex is generally used in this document rather than gender, because gender is a strictly linguistic term 
consisting of three categories in English: masculine, feminine and neuter. Having never met either a feminine or a 
neuter, this author prefers to use sex: male or female.

Even test makers themselves do not claim that standardized tests measure either
achievement or school outcomes (Bracey, 1997). Test makers such as ETS and ACT do 
claim a low-level relationship with first semester college grades (Murphy, 2000; Elert, 
1992). 

Several studies suggest that test score relationships with performance, as measured by
College GPA, differ between male and female students (Tusue & Whitaker, 1999; House,
1998; House, Gupta & Xiao, 1997), and between majority and minority students (Bieker,
1996; Lindle & Reinhart, 1998), or are influenced by other historical characteristics such
as parental education (Stricker & Rock, 1993).

Many studies report positive simple relationships (correlations) between tests (ACT,
SAT, GRE or GMAT) and first semester grades in college, either graduate or
undergraduate. However, even these relatively unimportant relationships are usually
quite low, generally ranging between r = .15 and r = .25. Elert (1992) summarizes
numerous studies on the SAT's relationship with first semester grades: "The best known
record of prediction by the SAT, reported in a 1978 ETS survey of studies, was at a New
Jersey college where the 1978 SAT-Verbal would have been matched as a predictor by
random chance only 59% of the time. The worst result was reported at a university in
Indiana where chance would have predicted grades as well as the 1972 SAT-Verbal
99.96% of the time (Breland & Minsky 149, p. 153)." A few other related studies include:
Astin, 1993; Paszczyk, 1994; MSCHE, 1997; Chernyshenko et al., 1999; House, Gupta &
Xiao, 1997; Bieker, 1996.
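
A common way to read correlations of this size is the coefficient of determination, r squared, which gives the share of variance in one measure that is statistically accounted for by a linear relationship with the other. The minimal Python sketch below applies it to the two ends of the range quoted above; the function name is hypothetical, and the calculation is an illustration added here rather than one taken from the studies cited.

```python
def variance_explained(r):
    """Share of variance accounted for by a simple correlation r (r squared)."""
    return r * r


for r in (0.15, 0.25):
    share = variance_explained(r) * 100
    print(f"r = {r:.2f} accounts for about {share:.2f}% of the variance in grades")
```

On this reading, correlations between .15 and .25 leave well over 90% of the variance in first semester grades unaccounted for by the test score.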

Regarding this, Alexander Astin states: "In a very practical sense, the student's ability to stay
in college is a more appropriate measure of his success than is his freshman GPA.
Although it is true that good grades will help him gain admission to graduate school, to
win graduate fellowships, and even to secure certain types of jobs, they are irrelevant to
any of these outcomes if the student drops out of college before completing his degree
requirements." Astin found that using SAT scores to predict who will graduate resulted
in 3.2% prediction for men and 2.9% for women. This means that for over 97% of the
cases, random selection would predict the odds of remaining in school as well as the
SAT. "Whether or not the student will drop out of college after the freshman year," Astin
noted, "can be predicted with only a low degree of accuracy" (cited in Elert, 1992).

Elert (1992) supports the findings of many studies by stating that previous grades are
about twice as good as the SAT at predicting academic achievement as measured by first
semester grades. The main justification for requiring the tests for admissions is that
although the SAT is an inferior predictor relative to high school grades, it can increase
the accuracy of prediction when used in combination with them. However, research
indicates that inclusion of the SAT increases early grade prediction by an average of only
5%. The major reason that the benefits are so low is that the SAT provides redundant
information. Elert (1992) notes:

Marginal as they are, the predictions of first year grades are the test's most 
accurate forecasts. Correlations between scores and grades in later years, and 
overall college average, are lower still. One study found that the ability of 
college admission tests to predict grades declined consistently from one 
semester to the next throughout eight semesters (Humphreys). The virtual 
disappearance of the aptitude tests' ability to predict beyond the freshman 
year has been explained by some commentators as a result of the nature of 
advanced study. Multiple choice testing dominates introductory courses, they 
argue, but intermediate and advanced courses demand a broader range of 
performance. 

Crouse & Trusheim (1988) conducted a detailed statistical analysis of the SAT's 
predictive shortcomings. Using data from the National Longitudinal Study (NLS) of the 
high school class of 1972, they calculated the number of additional correct admissions 
using high school rank (HSR) alone and with the SAT. With four different measures of 
undergraduate success, they calculated that using the SAT in admissions adds between 
0.1 and 2.7 additional correct forecasts per 100 applicants. 

Many other studies, a few of which are noted, report one or more of the following: (1) 
tests fail to predict a more pertinent measure of success in college than first term grades 
(retention or graduation), (2) tests relate negatively with time-to-degree (higher scorers 
take longer to graduate) or (3) that tests' prediction capacity pales in comparison with 
other key variables (Bangura, 1995; Adelman, 2006; Waugh, Micceri & Takalkar, 1994) 
or graduate level (Xiao, 1998; Onasch, 1994; Sternberg & Williams, 1997; Wright & 
Palmer, 1994; Morrison & Morrison, 1995). 

The Florida State University System (Florida SUS, 1995), which consisted at that time of 
nine universities (UF, FSU, FAMU, USF, UCF, FAU, FIU, UNF, UWF), conducted an 
extensive retention/graduation study that prompted the setting of sliding scale 
admission standards. Their study showed that once an applicant has a High School GPA 
of 3.0 or higher, test scores bear no relationship to retention/graduation. Below that 3.0 
cutoff, it may be worthwhile considering a student for UF or FSU if they show high test 
scores. The following quotes from Florida SUS (1995) speak to the issue of tests and 
admissions to college from the perspective of Florida in their first paragraph and from a 
literature review in their second: 

The greatest single predictor of success in College is High School Grades. 

Two methods are most frequently used to quantify high school 
performance: High School GPA and class standing or rank. Class standing 
is reported as a standard more often than is GPA. 

To give some idea of how extensive such research has been, among the highest 
relationships in the testing literature between the SAT and anything other than another 
such test is a correlation of r = .66 reported by French & John (1967) between SAT 
scores and uric acid levels in the blood. Such ETS-sponsored studies show just how 
widely ETS has searched for something their product predicts empirically. 
Methods 



This study addressed the following research question: 

Do any consistent score differences occur on Standardized Tests between different 
sexes or race/ethnicities who exhibit the same historic academic performance as 
measured by HSGPA? 

To assure adequate sample sizes in each cell, data from the Florida SUS Admissions files 
for all First Time In College (FTIC) applicants to all eleven SUS institutions for the 
Academic Years (AY) 1997-98 through 2005-06 were used to address the research 
question. 

Variables and Data Analysis 

High School GPA - HSGPA values are those reported in Admissions Files by SUS 
institutions. Students in high school obtain extra GPA points for taking AP, IB and 
Honors courses, thus the possible range of GPA values is from 0.0 to 5.0. For these 
analyses, in order to assure adequate sample sizes in each cell (Table 1), the GPA range 
was limited to 2.5 to 4.5 because some groups become comparatively rare at certain GPA 
levels (for example, Asians below 2.5 and most groups above 4.5). 

Race/Ethnicity - Self-reported classification obtained from SUS institutions. Analyses 
were limited to Asian, Black, Hispanic, Other and white, non-Hispanic. Other includes 
all students (American Indian, Other, etc.) who were not classified as unknown. Those 
classified as unknown were excluded from analysis. 

Sex - Self-reported classification obtained from SUS institutions. Students were 
classified as male or female; all unknown cases were excluded from analysis. 

Tests - Four different tests and subtests were submitted to analysis: 

• SAT Total Scores 

• SAT Quantitative Subscore 

• SAT Verbal Subscore 

• ACT Composite Score 

The Florida SUS Admissions Files - The Florida Master Admissions Databases are housed at 
the Northwest Regional Data Center in Tallahassee, Florida, and are part of the Florida 
Department of Education. The Florida SUS Master Admissions Files provide relevant academic 
and demographic data for all applicants to any SUS institution. 

GPA values were rounded to the nearest tenth of a point, and mean test scores for the four 
tests were computed for each race/ethnic subgroup and for both sexes separately for all 
students within each group at each HSGPA level. SAS version 9.1 was used to compute 
the subgroup score values, and between-group differences were obtained using 
Microsoft Excel 2004. Charts were created using Microsoft Excel. 
Limitations 
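
As a rough illustration of the computation described above, the following Python sketch rounds each HSGPA to the nearest tenth, keeps the 2.5 to 4.5 range, computes mean scores per group and GPA level, and subtracts each group's mean from a reference group's mean at the same level. The study itself used SAS 9.1 and Excel; the function names, record layout, and sample values here are hypothetical and only show the shape of the calculation.

```python
from collections import defaultdict
from statistics import mean


def mean_scores_by_group(records, lo=2.5, hi=4.5):
    """Mean score per (group, GPA level), with GPA rounded to the nearest tenth.

    `records` is an iterable of (group, hsgpa, score) tuples (assumed layout).
    Levels outside [lo, hi] are dropped, mirroring the range limit in the text.
    """
    buckets = defaultdict(list)
    for group, hsgpa, score in records:
        level = round(hsgpa, 1)
        if lo <= level <= hi:
            buckets[(group, level)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def gaps_vs_reference(means, reference):
    """Reference-group mean minus each other group's mean at the same GPA level."""
    gaps = {}
    for (group, level), value in means.items():
        ref_key = (reference, level)
        if group != reference and ref_key in means:
            gaps[(group, level)] = means[ref_key] - value
    return gaps


# Made-up records for illustration only; these are not study data.
sample_records = [
    ("White", 3.04, 1100),
    ("White", 2.96, 1060),
    ("Black", 3.01, 980),
    ("Black", 3.03, 1000),
    ("White", 4.72, 1400),  # rounds to 4.7, outside the 2.5-4.5 range, so dropped
]

sample_means = mean_scores_by_group(sample_records)
sample_gaps = gaps_vs_reference(sample_means, reference="White")
print(sample_gaps)
```

A positive gap means the reference group's mean is higher at that GPA level. Comparing within a GPA level is what lets the analysis hold prior academic performance constant.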

It is possible that GPA values differ from one school or school district to another. The 
possible influence of such effects on findings was addressed by assuring large samples 
for each cell (smallest among the major groups was 96 among Asians at 2.5 on the ACT). 
Large samples should weaken such effects on analyses. 

One might ask the question of whether there should be a relationship between HSGPA 
and tests. This is the only claim regarding the validity of these tests for college selection 
that the test makers themselves assert (Fairtest, 2006). Further, as Table 1 and Figure 1 
show, there is a consistent monotonic relationship between HSGPA to the tenth of a 
point and test score values that remains invariable across all tests and all 
race/ethnicity/sex subgroups. 



Results and Discussion 



Sample 

The total sample of FTIC students included 1,094,414 cases, of whom 698,054 had SAT 
scores and 396,360 had ACT scores. Limiting GPA values to the range of 2.5 to 4.5, so that 
only larger cells were included (the smallest cell size for the SAT was over 530 cases, and 
for the ACT, 96 cases), reduced the total sample of SAT scores to 628,946, and ACT to 358,586 (Table 1). 
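
The sample counts reported above can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative.

```python
# Counts as reported in the Sample paragraph.
total_cases = 1_094_414
sat_all, act_all = 698_054, 396_360
sat_kept, act_kept = 628_946, 358_586

# The SAT and ACT counts should add up to the reported total.
counts_consistent = (sat_all + act_all == total_cases)

# Cases excluded by limiting HSGPA to the 2.5-4.5 range.
sat_dropped = sat_all - sat_kept
act_dropped = act_all - act_kept

# Share of each test's cases retained after the GPA restriction.
sat_share = sat_kept / sat_all
act_share = act_kept / act_all

print(counts_consistent, sat_dropped, act_dropped)
print(f"SAT retained: {sat_share:.1%}, ACT retained: {act_share:.1%}")
```

The restriction therefore keeps roughly nine in ten cases for each test.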

Relationship between Mean Test Scores and HSGPA 

Figure 1 depicts the monotonic nature of the relationship between GPA values at the 10th 
of a point and SAT total scores. Table 1 shows that for all groups and subgroups 
considered in these analyses, mean test scores increase monotonically as mean HSGPA 
increases. The table shows that for Other students the same consistent differences tend 
to occur, along with the same monotonic relationship between GPA and test scores. Because the 
GPA increments are small (1/10th of a point), and given the monotonic nature of 
these variables' relationship, if the test scores were unbiased, one would expect a 
relatively random distribution of means between and across groups for a given HSGPA 
score (e.g., 3.1). Thus, any group should have an equal chance of scoring higher or lower 
than any other group at a given GPA point. However, the results below show consistent 
differences favoring males and whites at almost all GPA scale points. Further, the mean 
differences between groups range from lows of about 20 points on the SAT, to highs of 
about 140 points. As was noted earlier, ETS (2001) suggests: "The user can be 
reasonably confident that a score difference of around 60 points or more indicates a true 
difference in ability between two test takers." 
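
One way to make the "equal chance" expectation concrete is a sign test, which is offered here only as an illustration and is not an analysis reported in this study. Assuming the 21 GPA levels from 2.5 to 4.5 are treated as independent comparisons, and that an unbiased test gives each group a 50% chance of the higher mean at each level, the sketch computes how unlikely it would be for one group to come out ahead at a given number of levels. The function name is hypothetical.

```python
from math import comb


def sign_test_p_value(n_levels: int, n_favoring: int) -> float:
    """Two-sided sign-test p-value under a 50/50 null hypothesis.

    n_levels:   number of GPA levels compared (assumed independent).
    n_favoring: number of levels at which one particular group scores higher.
    """
    # Use the more extreme side so the test is symmetric in the two groups.
    k = max(n_favoring, n_levels - n_favoring)
    tail = sum(comb(n_levels, i) for i in range(k, n_levels + 1)) / 2 ** n_levels
    return min(1.0, 2 * tail)


# GPA levels 2.5, 2.6, ..., 4.5 give 21 comparison points.
n_gpa_levels = 21

# One group higher at every level versus an almost even split.
p_all_levels = sign_test_p_value(n_gpa_levels, 21)
p_even_split = sign_test_p_value(n_gpa_levels, 11)

print(p_all_levels, p_even_split)
```

Under these assumptions, one group leading at all 21 levels would occur by chance less than once in a million pairings, whereas an 11 to 10 split is entirely unremarkable. Adjacent GPA levels are unlikely to be fully independent in practice, so the figure is best read as an order-of-magnitude guide.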
Figure 1 

Monotonic Relationship Between Test Scores and High School GPA (N=628,948) 

[Chart: Mean SAT Total Scores by GPA and Race/Ethnicity. Series: White, Asian, Hispanic, Black.] 
Table 1 

Mean Scores on SAT and ACT Tests for All FTIC Applicants, Summer 1997 through Spring 2006 

[Table: N and Mean for Female and Male applicants within each race/ethnicity group (Asian, Black, Hispanic, Other, ...) at each HSGPA level.] 
Gaps Become Smaller as GPA Values Increase 

Figure 2 shows that as GPA values increase, the advantage favoring whites and males tends to 
become smaller. In fact, at the 4.3 GPA level, Asian students begin scoring higher than white 
students. However, given the 2001 statement by ETS that a difference of 60 points 
represents a "true" difference in ability, one must realize, looking at Figure 2, that if that 
statement is in fact accurate, Black and female students always have "less" ability than whites 
and males, and only at GPA points of 4.3 to 4.5 do Hispanic students have no "true" 
disadvantage in ability. 

Figure 2 

Racial/Ethnic and Sex Advantages of Whites and Males by GPA Category - 
Mean of SUS Fall Annual FTIC Cohorts 1997-98 through 2005-06 

[Chart: Mean FTIC Applicant SAT Score Difference from White and Male by GPA (N=628,948). Series: Black, Female, Hispanic, Asian.] 



Quantitative and Verbal Subscore Differences 

Figure 3 displays SAT quantitative and verbal subscore differences between whites and 
minorities and males and females. ETS (2001) notes that differences of 30 points on the 
subscores, although less reliable than the total score, represent "real" differences in ability. A 
few rather interesting differences occur between these two subscores. First, on the Quantitative 
portion (top panel), Asians almost always score as high as or higher than whites. Second, 
regarding verbal differences (bottom panel), three groups usually score near or below the 30 
point difference criterion of "true" differences: Females, Asians and Hispanics. Finally, females 
always show only a 20 point or lesser disadvantage on the verbal subtest, but a 50 to 60 point 
disadvantage on the quantitative subtest. 





Summary, Conclusion and Implications 

Detailed analysis in this discussion was limited to SAT test scores, although similar consistent 
differences occurred for ACT tests as well. The following points appear to be consistent in these 
analyses: 

• All groups and subgroups show a monotonic relationship between HSGPA values to the 
10th of a point and test score values. 

• Given that one should expect a random distribution of differences at the same 10th of a 
GPA level, these analyses indicate that consistent biases favoring whites and males 
result from the use of standardized tests as admission requirements. 

• Usually the gaps favoring whites or males are well above the 60 points that ETS (2001) 
states reflects a "true" difference in ability. This "true" difference occurs where minority 
and female students are exhibiting exactly the same level of academic performance as 
measured by GPA (actual performance). 

• The gaps favoring whites and males decrease somewhat as GPA values increase. 

One vital point regarding all of these analyses is that the GPA is the criterion of success in 
school, whether high school or college. Standardized test scores, no matter how high, do not 
earn a degree, the certificate verifying academic success or failure. Thus, GPA must be 
considered the "true" score, with test scores merely a theoretical proxy. Secondly, it is 
important to realize that GPA tends to be a very reliable estimate of academic performance and 
certainly, next to affluence, the strongest predictor of success in college (Florida SUS, 1995; 
Mortenson, 1999, 2000). A student's high school or college GPA derives from multiple 
observers (teachers), in multiple disciplines, over an extended period of time (three to six years 
for college). As a result, this tends to produce a considerably more valid and reliable estimate of 
academic performance than the point-in-time, strictly abstract estimate that tests provide. 

In summary, these results are consistent and appear clear. This is particularly true when 
viewed within the context of the literature, in that a specific bias favoring whites and males 
results from the use of standardized tests as admissions criteria. Further, this appears to be a 
factor, if not a major factor, in the recent movement of minority students from entry into SUS 
institutions as FTIC students to entry as CCT students because the Florida Community 
Colleges require only a high school diploma as a criterion for entry as a degree-seeking student. 

The apparently increasing use of test scores as a criterion for entry to higher education appears 
to produce both an unjust and socially unwise bias favoring males, whites and the affluent 
(Heller & Rasmussen, 2006). This reflects poorly on the integrity of Florida's SUS in that 
institutions continue to perpetuate discriminatory techniques that reduce the access of 
minorities and females to the more prestigious academic pipeline. The problem is exacerbated 
as one moves up the ladder of prestige and its effects are well stated by Haycock & Gerald 
(2006, p. 7): "...the 50 flagship universities now look less and less like America - and more and 
more like 'gated communities of higher education.'" Further (p. 5): "...the highest achieving 
students from high-income families - those who earned top grades, completed the full battery of 
college prep courses, and took AP courses as well - are nearly four times more likely than 
low-income students with exactly the same level of academic accomplishment to end up in a highly 
selective university." These recent findings extend the research reported by researchers such as 
Mortenson (1990). The current study indicates that one reason this occurs is the use of test scores that are 
biased against minorities and the less affluent. The use of tests is influenced by rankings like 
U.S. News and World Report's "America's Best Colleges," which was specifically designed to sell 
magazines. As Haycock & Gerald note (p. 3), the result is: "Rated less for what they accomplish 
with the students they let in than by how many students they keep out, many of these flagship 
institutions have become more and more enclaves for the most privileged of their state's young 
people." Of course, most faculty come from the elite colleges, thus perpetuating the historic 
academic status quo of white male faculty. 

A Call for Change 

Study after study has shown that for neither graduate nor undergraduate students do standardized 
tests provide any useful information beyond that provided by GPA regarding a student's 
likelihood of success. For undergraduate students, two factors have been shown to be the best 
predictors of academic success: HSGPA and the rigor of high school coursework a student takes 
(Adelman, 2004). Standardized tests should probably be used as a criterion for college entry in 
one and only one situation: when a student has not taken school seriously through high school, 
a high test score may indicate good skills in standard English, which is a vital factor in higher 
education success, and therefore suggest taking a chance on students with low performance in 
high school (e.g., 2.0 GPA). 